An English-Chinese Cross-lingual Word Semantic Similarity Measure Exploring Attributes and Relations

نویسندگان

  • Lin Dai
  • Heyan Huang
چکیده

Word semantic similarity measuring is a fundamental issue to many NLP applications and the globalization has made an urgent request for cross-lingual word similarity measure. This paper proposed a word semantic similarity measure which is able to work in cross-lingual scenarios. Basically, a concept can be defined by a set of attributes. The basic idea of this work is to compute the similarity between words by exploring their attributes and relations. For a given word pair, we first compute similarities between their attributes by combining distance, depth and relation information. Then word similarity are computed through a combination scheme. The algorithm is implemented based on an English-Chinese bilingual ontology HowNet. Experiments show that the proposed algorithm results in high correlation against human judgments, which encourages its broad application in cross-lingual applications.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks

This paper suggests a method for detecting cross-lingual semantic similarity using parallel PropBanks. We begin by improving word alignments for verb predicates generated by GIZA++ by using information available in parallel PropBanks. We applied the Kuhn-Munkres method to measure predicateargument matching and improved verb predicate alignments by an F-score of 12.6%. Using the enhanced word al...

متن کامل

Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet

Semantic similarity measurement of multilingual words is a challenging problem in data mining, information extraction, information retrieval, etc. This paper introduces an algorithm to measure the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an expansion of WordNet in Simplified Chinese. The algorithm not only measures the semantic similarity for Chinese and ...

متن کامل

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures

We report here our work on English French Cross-lingual Word Sense Disambiguation where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two...

متن کامل

NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features

This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...

متن کامل

Cross-Lingual Syntactically Informed Distributed Word Representations

We develop a novel cross-lingual word representation model which injects syntactic information through dependencybased contexts into a shared cross-lingual word vector space. The model, termed CLDEPEMB, is based on the following assumptions: (1) dependency relations are largely language-independent, at least for related languages and prominent dependency links such as direct objects, as evidenc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011